Social Determinants in Diabetes Prevalence and Management

BMIN503/EPID600 Final Project

Author

Karen Tang


1 Overview

The project is to analyze the role of social determinants in diabetes prevalence and managment. The social determinants will be explore in this project are race, income, insurance coverage, and clinic proximity. The goal is to understand how these factors influence the prevalence and management of diabetes, providing insights that can inform public strategies, healthcare policies, and intervention programs.

I spoke to Dr. Richard Tsui about my project, he guided me to choose a specific social determinants that directly correlates to the disease I want to learn more about.

2 Introduction

According to the CDC, in 2020, 38.4 million people in the United States of all ages had diabetes. Diabetes was the eighth leading cause of death in the United States. In an article called “Overview of Social Determinants of Health in the Development of Diabetes” from the Diabetes Journals stated that diabetes has a long-standing, well-documented socioeconomic and racial/ethnic inequalities in disease prevalence and incidence, morbidity and mortality. Higher diabetes prevalence is associated with lower education, lower income, and non-White race/ethnicity.

World Health Organization (WHO) Commission defined Social Determinants of Health (SDOH) as “the conditions in which people are born, grow, live, work and age, and the wider set of forces and systems shaping the conditions of daily life”. SDOH attributed between 30%-55% of health outcome and they viewed as the main driver of avoidable health inequities. Due to the association between social determinants of health and diabetes, I would like to learn and conduct an analysis on the following factors: race, income, and insurance, distance to clinics.

3 Methods

The datasets I used are the following:

1.Diabetes = the dataset is uploaded to this repository under the name of “ExportCSV.csv”, it is state level survey data of the year 2022 from the Behavioral Risk Surveillance System (BRFSS).

2. Social determinants = the dataset is a county level data of Census Tract 2020 from the Social Determinants of Health Database. Since the file is too big to upload here, I filtered the datset by only Pennsylvania and West Virginia and export it as a csv file. The dataset is uploaded to this repository under the name of “PA_WV_social”

3. Geospatial Data from the tigris package in Rstudio.

#Loading the necessary packages
library(readxl)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.5
✔ ggplot2   3.5.1     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(sf)
Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
library(tigris)
To enable caching of data, set `options(tigris_use_cache = TRUE)`
in your R script or .Rprofile.
library(leaflet)
library(maps)

Attaching package: 'maps'

The following object is masked from 'package:purrr':

    map
library(readr)
library(knitr)
library(RColorBrewer)
#Loading diabetes dataset
diabetes <- read_csv("C:/Users/nghik/Downloads/ExportCSV.csv")
Rows: 235 Columns: 28
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (21): LocationAbbr, LocationDesc, Class, Topic, Indicator, Response, Dat...
dbl  (7): ID, Year, Low_Confidence_Limit, High_Confidence_Limit, Sample_Size...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Only choosing dataset for PA and WV
PA_WV_diabetes <- diabetes %>%
  filter(LocationDesc %in% c("Pennsylvania", "West Virginia"))

#Having the variable Data_Value as numeric in order to graph/compare later
diabetes$Data_Value <- as.numeric(diabetes$Data_Value)
Warning: NAs introduced by coercion
#Looking for the state with the highest diabetes diagnosis, excluding U.S. islands
most_diabetes <- diabetes %>%
  filter(Response == "Yes", 
         !grepl("median", LocationDesc, ignore.case = TRUE), 
         !LocationDesc %in% c("Guam", "Puerto Rico", "Virgin Islands")) %>%
  select(LocationDesc, Response, Data_Value) %>%
  arrange(desc(Data_Value)) %>%
  head(1)

# Print the result
summary(most_diabetes)
 LocationDesc         Response           Data_Value  
 Length:1           Length:1           Min.   :17.4  
 Class :character   Class :character   1st Qu.:17.4  
 Mode  :character   Mode  :character   Median :17.4  
                                       Mean   :17.4  
                                       3rd Qu.:17.4  
                                       Max.   :17.4  

From the output above, the state with the highest population diagnosed with diabetes is West Virginia.

Showing the percentage of population with diabetes from Pennsylvania and West Virginia

print(PA_WV_diabetes %>%
        filter(Response == "Yes") %>%
        select(LocationDesc, Response, Data_Value))
# A tibble: 2 × 3
  LocationDesc  Response Data_Value
  <chr>         <chr>    <chr>     
1 Pennsylvania  Yes      11.5      
2 West Virginia Yes      17.4      

Providing a visualization on the percentage of population with diabetes based on all states in the United States

#Diabetes = yes data 
diabetes_yes <- diabetes %>%
  filter(Response == "Yes", 
         !grepl("median", LocationDesc, ignore.case = TRUE), 
         !LocationDesc %in% c("Guam", "Puerto Rico", "Virgin Islands"))

#Download counties data on every states in U.S.
counties1 <- counties(cb = TRUE, class = "sf")
Retrieving data for the year 2022

  |                                                                            
  |                                                                      |   0%
  |                                                                            
  |                                                                      |   1%
  |                                                                            
  |=                                                                     |   1%
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |===                                                                   |   4%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |====                                                                  |   6%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |=====                                                                 |   8%
  |                                                                            
  |======                                                                |   9%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  11%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  14%
  |                                                                            
  |===========                                                           |  15%
  |                                                                            
  |===========                                                           |  16%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |==============                                                        |  19%
  |                                                                            
  |==============                                                        |  21%
  |                                                                            
  |===============                                                       |  21%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |=================                                                     |  24%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |==================                                                    |  26%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  29%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  31%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  34%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |=========================                                             |  36%
  |                                                                            
  |==========================                                            |  36%
  |                                                                            
  |==========================                                            |  38%
  |                                                                            
  |===========================                                           |  39%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  42%
  |                                                                            
  |===============================                                       |  44%
  |                                                                            
  |===============================                                       |  45%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |=================================                                     |  48%
  |                                                                            
  |==================================                                    |  49%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  51%
  |                                                                            
  |=====================================                                 |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  54%
  |                                                                            
  |=======================================                               |  56%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  59%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  61%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |=============================================                         |  64%
  |                                                                            
  |==============================================                        |  66%
  |                                                                            
  |===============================================                       |  66%
  |                                                                            
  |===============================================                       |  68%
  |                                                                            
  |================================================                      |  69%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  71%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  74%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |=====================================================                 |  76%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |=======================================================               |  79%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  81%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |===========================================================           |  84%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |============================================================          |  86%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |=============================================================         |  88%
  |                                                                            
  |==============================================================        |  89%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  94%
  |                                                                            
  |===================================================================   |  95%
  |                                                                            
  |===================================================================   |  96%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |===================================================================== |  99%
  |                                                                            
  |======================================================================| 100%
#Joining diabetes and counties to map
diabetes_map <- inner_join(diabetes_yes, counties1, by = c("LocationAbbr" = "STUSPS"))

#Making the diabetes as sf data to map
diabetes_map <- st_as_sf(diabetes_map)

#Create color palette based on the percentage of diabetes 
pal <- colorNumeric(
  palette = "YlOrRd",
  domain = diabetes_map$Data_Value
)
# Create a popup showing diabetes data
map_data <- diabetes_map %>%
  mutate(popup_info = paste0(
    "<b>State:</b> ", LocationAbbr, "<br>",
    "<b>White:</b> ", Data_Value, "<br>"
  ))

# Generate the map
diabetes_by_state <- leaflet(data = map_data) %>%
  addTiles() %>%
  addPolygons(
    fillColor = ~pal(Data_Value), 
    fillOpacity = 0.5,
    color = "black",
    weight = 1,
    popup = ~popup_info
  ) %>%
  fitBounds(
    lng1 = min(st_bbox(diabetes_map)$xmin), # Minimum longitude
    lat1 = min(st_bbox(diabetes_map)$ymin), # Minimum latitude
    lng2 = max(st_bbox(diabetes_map)$xmax), # Maximum longitude
    lat2 = max(st_bbox(diabetes_map)$ymax)  # Maximum latitude
  )
Warning: sf layer has inconsistent datum (+proj=longlat +datum=NAD83 +no_defs).
Need '+proj=longlat +datum=WGS84'
diabetes_by_state

Let’s make a histogram of diabetes in both West Virginia and Pennsylvania to see the difference of diabetes prevalence between the two states.

#Making histogram
diabetes_state <- ggplot(data = PA_WV_diabetes, aes(x = Response, y = Data_Value, fill = LocationDesc)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_viridis_d(name = "States", option = "C") + # "C" is one of the viridis color options
  labs(
    title = "Diabetes Numbers Based on States",
    x = "Diagnosis", y = "Percentage (%)"
  )
  
diabetes_state

Based on the histogram output, the percentage of No-diabetes is significantly higher than diabetes diagnosis. This means that people are being tested for diabetes and most tests are coming back negative. This can be due to people visiting their doctor regularly to have routine checkups.

Next, I will be preparing the datasets are analysis - this following steps were done beforehand to minimize file size because I had a hard time rendering the file

Loading in the social determinants file social_determinants read_excel(“C:/Users/nghik/Downloads/sdoh_2020_tract_1_0.xlsx”, sheet = 2)

Selecting data from just Pennsylvania and West Virginia PA_WV_social social_determinants %>% filter(STATE == c(“Pennsylvania”, “West Virginia”))

writing the file and send it to my computer write.csv(PA_WV_social, “C:/Users/nghik/Downloads/PA_WV_social”)

The next steps are done to complete the analysis after the data clean up above

#Pulling PA_WV_social dataset in from my computer
PA_WV_social <- read_csv("C:/Users/nghik/Downloads/PA_WV_social")
New names:
• `` -> `...1`
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
Rows: 1996 Columns: 330
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (3): STATE, COUNTY, REGION
dbl (325): ...1, YEAR, TRACTFIPS, COUNTYFIPS, STATEFIPS, TERRITORY, ACS_TOT_...
lgl   (2): ACS_MEDIAN_HH_INC_AIAN, ACS_MEDIAN_HH_INC_NHPI

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Selecting variables that will be use 
PA_WV_social_final <- PA_WV_social %>%
  select(c("TRACTFIPS", "STATE", "COUNTYFIPS", "STATEFIPS", #States identifiers
           "ACS_PCT_WHITE", "ACS_PCT_AIAN", "ACS_PCT_ASIAN", "ACS_PCT_BLACK", "ACS_PCT_HISPANIC", "ACS_PCT_NHPI", "ACS_PCT_WHITE", #race/ethnicity
           "ACS_PCT_MEDICAID_ANY","ACS_PCT_MEDICARE_ONLY","ACS_PCT_PRIVATE_ANY",  "ACS_PCT_PUBLIC_ONLY","ACS_PCT_UNINSURED",#Insurance
           "ACS_PCT_HH_INC_10000", "ACS_PCT_HH_INC_14999", "ACS_PCT_HH_INC_24999", "ACS_PCT_HH_INC_49999", "ACS_PCT_HH_INC_99999", "ACS_PCT_HH_INC_100000", #Income
           "POS_DIST_CLINIC_TRACT")) #Distance to the nearest clinic

#Downloading data with sf data in order to map
counties <- counties(state = c("WV", "PA"), cb = TRUE, class = "sf") #FIPS CODE: 54 = West Virginia, and 42 = Pennsylvania
Retrieving data for the year 2022
summary(counties)
   STATEFP            COUNTYFP           COUNTYNS           AFFGEOID        
 Length:122         Length:122         Length:122         Length:122        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
    GEOID               NAME             NAMELSAD            STUSPS         
 Length:122         Length:122         Length:122         Length:122        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
  STATE_NAME            LSAD               ALAND               AWATER         
 Length:122         Length:122         Min.   :2.140e+08   Min.   :3.027e+05  
 Class :character   Class :character   1st Qu.:9.347e+08   1st Qu.:5.447e+06  
 Mode  :character   Mode  :character   Median :1.296e+09   Median :1.134e+07  
                                       Mean   :1.460e+09   Mean   :3.186e+07  
                                       3rd Qu.:1.986e+09   3rd Qu.:2.049e+07  
                                       Max.   :3.183e+09   Max.   :1.966e+09  
          geometry  
 MULTIPOLYGON :122  
 epsg:4269    :  0  
 +proj=long...:  0  
                    
                    
                    

With all the data ready, let’s dive into specific analysis. I will be making a race dataframe to have data on just data on race and conduct a histogram to see the break down of race from both states.

#Making a race dataframe
race <- PA_WV_social_final %>%
  select("TRACTFIPS", "STATE", "COUNTYFIPS", "STATEFIPS", #States identifiers
         "ACS_PCT_WHITE", #Percentage of population reported as White
         "ACS_PCT_AIAN", #Percentage of population reported as American Indian and Alaska Native
         "ACS_PCT_ASIAN", #Percentage of population reported as Asian
         "ACS_PCT_BLACK", #Percentage of population reported as Black
         "ACS_PCT_HISPANIC", #Percentage of population reported as Hispanic
         "ACS_PCT_NHPI" #Percentage of population reported as Native Hawaiian and Pacific Islander
         )

#Average from each race from both states
race_summary <- race %>%
  group_by(STATE) %>%
  summarise(Average_White = mean(ACS_PCT_WHITE, na.rm = TRUE),
            Average_AIAN = mean(ACS_PCT_AIAN, na.rm = TRUE),
            Average_Asian = mean(ACS_PCT_ASIAN, na.rm = TRUE),
            Average_Black = mean (ACS_PCT_BLACK, na.rm = TRUE),
            Average_Hispanic = mean (ACS_PCT_HISPANIC, na.rm = TRUE),
            Average_NHPI = mean (ACS_PCT_NHPI, na.rm = TRUE))
summary(race_summary)
    STATE           Average_White    Average_AIAN    Average_Asian   
 Length:2           Min.   :79.05   Min.   :0.1622   Min.   :0.7633  
 Class :character   1st Qu.:82.36   1st Qu.:0.1634   1st Qu.:1.3747  
 Mode  :character   Median :85.68   Median :0.1646   Median :1.9860  
                    Mean   :85.68   Mean   :0.1646   Mean   :1.9860  
                    3rd Qu.:88.99   3rd Qu.:0.1658   3rd Qu.:2.5973  
                    Max.   :92.30   Max.   :0.1671   Max.   :3.2086  
 Average_Black    Average_Hispanic  Average_NHPI     
 Min.   : 3.876   Min.   :1.535    Min.   :0.008934  
 1st Qu.: 5.827   1st Qu.:2.998    1st Qu.:0.014351  
 Median : 7.778   Median :4.460    Median :0.019769  
 Mean   : 7.778   Mean   :4.460    Mean   :0.019769  
 3rd Qu.: 9.729   3rd Qu.:5.922    3rd Qu.:0.025187  
 Max.   :11.680   Max.   :7.385    Max.   :0.030604  

Making the histogram of race

state_long <- race_summary %>%
  pivot_longer(
    cols = starts_with("Average_"),    # Columns to pivot
    names_to = "Race",             # New column for race
    values_to = "Percentage"       # New column for values
  ) %>%
  mutate(Race = gsub("Mean_", "", Race))  # Clean up race names

race_graph <- ggplot(state_long, aes(x = Race, y = Percentage, fill = STATE)) +
  geom_bar(stat = "identity", position = "dodge") + 
  theme_minimal() +
  labs(
    title = "Racial Demographics Comparison by State",
    x = "Race Category",
    y = "Percentage (%)",
    fill = "State"
  ) +
  scale_fill_viridis_d(option = "C", direction = -1) +  
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
race_graph

Based on the graph above, it looks like Pennsylvania is far more ethnical and racial diverse comparing to West Virginia, which could influence diabetes and healthcare access. The diversity in PA may lead to varies health risks and disparties, while WV, with more homogeneous population may face different but significant healthcare challenges. Next step, I will look into income as lower income often linked to poorer health outcomes.

Merging counties data and all interest variables to make maps

#Removing the two first number in COUNTY FIPS
counties$GEOID <- as.numeric(counties$GEOID)
#Joining the two dataset
map_need <- inner_join(counties, PA_WV_social_final, by = c("GEOID" = "COUNTYFIPS"))
summary(map_need)
   STATEFP            COUNTYFP           COUNTYNS           AFFGEOID        
 Length:1996        Length:1996        Length:1996        Length:1996       
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
     GEOID           NAME             NAMELSAD            STUSPS         
 Min.   :42001   Length:1996        Length:1996        Length:1996       
 1st Qu.:42029   Class :character   Class :character   Class :character  
 Median :42079   Mode  :character   Mode  :character   Mode  :character  
 Mean   :43702                                                           
 3rd Qu.:42105                                                           
 Max.   :54109                                                           
                                                                         
  STATE_NAME            LSAD               ALAND               AWATER         
 Length:1996        Length:1996        Min.   :2.140e+08   Min.   :3.027e+05  
 Class :character   Class :character   1st Qu.:9.577e+08   1st Qu.:1.142e+07  
 Mode  :character   Mode  :character   Median :1.565e+09   Median :2.161e+07  
                                       Mean   :1.538e+09   Mean   :6.080e+07  
                                       3rd Qu.:2.069e+09   3rd Qu.:3.705e+07  
                                       Max.   :3.183e+09   Max.   :1.966e+09  
                                                                              
   TRACTFIPS            STATE             STATEFIPS     ACS_PCT_WHITE   
 Min.   :4.200e+10   Length:1996        Min.   :42.00   Min.   :  0.00  
 1st Qu.:4.203e+10   Class :character   1st Qu.:42.00   1st Qu.: 76.35  
 Median :4.208e+10   Mode  :character   Median :42.00   Median : 90.25  
 Mean   :4.370e+10                      Mean   :43.64   Mean   : 80.88  
 3rd Qu.:4.211e+10                      3rd Qu.:42.00   3rd Qu.: 96.17  
 Max.   :5.411e+10                      Max.   :54.00   Max.   :100.00  
                                                        NA's   :20      
  ACS_PCT_AIAN    ACS_PCT_ASIAN    ACS_PCT_BLACK    ACS_PCT_HISPANIC
 Min.   :0.0000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
 1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.: 0.510   1st Qu.: 0.900  
 Median :0.0000   Median : 0.770   Median : 2.515   Median : 2.610  
 Mean   :0.1664   Mean   : 2.872   Mean   :10.606   Mean   : 6.580  
 3rd Qu.:0.0425   3rd Qu.: 3.310   3rd Qu.:10.053   3rd Qu.: 6.133  
 Max.   :7.9300   Max.   :42.640   Max.   :98.190   Max.   :88.870  
 NA's   :20       NA's   :20       NA's   :20       NA's   :20      
  ACS_PCT_NHPI     ACS_PCT_MEDICAID_ANY ACS_PCT_MEDICARE_ONLY
 Min.   :0.00000   Min.   : 0.000       Min.   : 0.000       
 1st Qu.:0.00000   1st Qu.: 7.798       1st Qu.: 3.530       
 Median :0.00000   Median :14.250       Median : 5.050       
 Mean   :0.02762   Mean   :17.662       Mean   : 5.474       
 3rd Qu.:0.00000   3rd Qu.:23.927       3rd Qu.: 6.862       
 Max.   :2.30000   Max.   :81.130       Max.   :38.000       
 NA's   :20        NA's   :24           NA's   :24           
 ACS_PCT_PRIVATE_ANY ACS_PCT_PUBLIC_ONLY ACS_PCT_UNINSURED ACS_PCT_HH_INC_10000
 Min.   :  0.00      Min.   : 0.00       Min.   : 0.000    Min.   : 0.00       
 1st Qu.: 53.45      1st Qu.:14.19       1st Qu.: 2.598    1st Qu.: 2.36       
 Median : 65.54      Median :21.18       Median : 4.600    Median : 4.63       
 Mean   : 63.24      Mean   :24.27       Mean   : 5.663    Mean   : 6.55       
 3rd Qu.: 75.12      3rd Qu.:31.20       3rd Qu.: 7.385    3rd Qu.: 8.61       
 Max.   :100.00      Max.   :81.13       Max.   :46.990    Max.   :52.72       
 NA's   :24          NA's   :24          NA's   :24        NA's   :27          
 ACS_PCT_HH_INC_14999 ACS_PCT_HH_INC_24999 ACS_PCT_HH_INC_49999
 Min.   :  0.000      Min.   : 0.000       Min.   : 0.00       
 1st Qu.:  1.640      1st Qu.: 5.350       1st Qu.:16.12       
 Median :  3.520      Median : 8.760       Median :21.86       
 Mean   :  4.582      Mean   : 9.576       Mean   :21.91       
 3rd Qu.:  6.160      3rd Qu.:12.750       3rd Qu.:27.17       
 Max.   :100.000      Max.   :48.570       Max.   :52.27       
 NA's   :27           NA's   :27           NA's   :27          
 ACS_PCT_HH_INC_99999 ACS_PCT_HH_INC_100000 POS_DIST_CLINIC_TRACT
 Min.   :  0.00       Min.   :  0.00        Min.   : 0.030       
 1st Qu.: 24.88       1st Qu.: 14.33        1st Qu.: 1.010       
 Median : 30.79       Median : 23.94        Median : 2.820       
 Mean   : 30.21       Mean   : 27.17        Mean   : 4.264       
 3rd Qu.: 35.78       3rd Qu.: 37.03        3rd Qu.: 6.122       
 Max.   :100.00       Max.   :100.00        Max.   :26.460       
 NA's   :27           NA's   :27                                 
          geometry   
 MULTIPOLYGON :1996  
 epsg:4269    :   0  
 +proj=long...:   0  
                     
                     
                     
                     

Let’s move to income now

income <- PA_WV_social_final %>%
  select(TRACTFIPS, STATE, COUNTYFIPS, STATEFIPS, #State identidiers
         ACS_PCT_HH_INC_10000,#percentage of household with income less than $10,000
         ACS_PCT_HH_INC_14999, #Percentage of household with income $10,000 and $14,999
         ACS_PCT_HH_INC_24999, #Percentage of household with income $15,000 and $24,999
         ACS_PCT_HH_INC_49999, #Percentage of household with income $25,000 and $49,999
         ACS_PCT_HH_INC_99999, #Percentage of houshold with income more than $100,000
         ACS_PCT_HH_INC_100000) #Percentage of household with income $50,000 and $99,999

#Average from each race from both states
income_summary <- income %>%
  group_by(STATE) %>%
  summarise(Average_10000 = mean(ACS_PCT_HH_INC_10000, na.rm = TRUE),
            Average_10000_14999= mean(ACS_PCT_HH_INC_14999, na.rm = TRUE),
            Average_15000_24999 = mean(ACS_PCT_HH_INC_24999, na.rm = TRUE),
            Average_25000_44999 = mean (ACS_PCT_HH_INC_49999, na.rm = TRUE),
            Average_50000_99999 = mean (ACS_PCT_HH_INC_99999, na.rm = TRUE),
            Average_100000 = mean (ACS_PCT_HH_INC_100000, na.rm = TRUE))

income_long <- income_summary %>%
  pivot_longer(
    cols = starts_with("Average_"),    # Columns to pivot
    names_to = "Income",             # New column for race
    values_to = "Percentage"       # New column for values
  ) %>%
  mutate(Income = gsub("Mean_", "", Income))  # Clean up race names

income_graph <- ggplot(income_long, aes(x = Income, y = Percentage, fill = STATE)) +
  geom_bar(stat = "identity", position = "dodge") +  # Grouped bars
  theme_minimal() +
  labs(
    title = "Income Comparison by State",
    x = "Income Category",
    y = "Percentage (%)",
    fill = "State"
  ) +
  scale_fill_viridis_d(option = "C", direction = -1) +  
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

income_graph

With the graph above, it looks like Pennsylvania households have higher salary comparing to West Virginia, which may overall distribute to better access to healthcare and diabetes management. This income disparity often linked to difference in insurance coverage, as higher income are more likely to result in better access to private insurance or employer-sponsored plans, which may be the factor that reduce care barriers for PA residents compared to those in WV.

insurance <- PA_WV_social_final %>%
  select(TRACTFIPS, STATE, COUNTYFIPS, STATEFIPS, #State identidiers
         ACS_PCT_MEDICAID_ANY,
         ACS_PCT_MEDICARE_ONLY,
         ACS_PCT_PRIVATE_ANY,
         ACS_PCT_PUBLIC_ONLY,
         ACS_PCT_UNINSURED)

#Average from each race from both states
insurance_summary <- insurance %>%
  group_by(STATE) %>%
  summarise(AverageMedicaid = mean(ACS_PCT_MEDICAID_ANY, na.rm = TRUE),
            AverageMedicare= mean(ACS_PCT_MEDICARE_ONLY, na.rm = TRUE),
            AveragePrivate = mean(ACS_PCT_PRIVATE_ANY, na.rm = TRUE),
            AveragePublic = mean (ACS_PCT_PUBLIC_ONLY, na.rm = TRUE),
            AverageUninsured = mean (ACS_PCT_UNINSURED, na.rm = TRUE))
insurance_summary
# A tibble: 2 × 6
  STATE         AverageMedicaid AverageMedicare AveragePrivate AveragePublic
  <chr>                   <dbl>           <dbl>          <dbl>         <dbl>
1 Pennsylvania             16.8            5.28           64.7          23.1
2 West Virginia            22.9            6.70           54.3          31.7
# ℹ 1 more variable: AverageUninsured <dbl>
insurance_long <- insurance_summary %>%
  pivot_longer(
    cols = starts_with("Average"),    # Columns to pivot
    names_to = "Insurance",             # New column for race
    values_to = "Percentage"       # New column for values
  ) %>%
  mutate(Insurance = gsub("Mean_", "", Insurance))  # Clean up race names

insurance_graph <- ggplot(insurance_long, aes(x = Insurance, y = Percentage, fill = STATE)) +
  geom_bar(stat = "identity", position = "dodge") +  # Grouped bars
  theme_minimal() +
  labs(
    title = "Insurance Comparison by State",
    x = "Insurance Category",
    y = "Percentage (%)",
    fill = "State"
  ) +
  scale_fill_viridis_d(option = "C", direction = -1) +  
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

insurance_graph

With the graph above, it looks like almost everyone in both West Virginia and Pennsylvania has insurance, more specifically private insurance and only a small percentage of the population are uninsured. This might correlates to the high percentage of “No” in diabetes prevalence because people with insurance are more likely to have routine checkups comparing to people who are uninsured.

Next, I would like to examine whether there are near-by clinics in both states as access to healthcare facillities may also influence individuals’ ability to receive routine checkups and diabetes management.

#Create color palette based on the white population
pal <- colorRampPalette(brewer.pal(9, "YlOrRd"))(100)

# Create the map using ggplot
clinic_distance <- ggplot(map_need) +
  geom_sf(aes(fill = POS_DIST_CLINIC_TRACT)) +  
  scale_fill_gradientn(colors = pal, name = "Distance from Clinic") +  
  theme_minimal() +
  labs(title = "Distance from Clinic by County in Pennsylvania and West Virginia")  

clinic_distance

4 Results

The analysis shows that Pennsylvania have more income and more diverse Both West Virginia and Pennsylvania have high rates of insurance coverage with only a small percentage of the population uninsured. The higher income and insurance coverage is likely linked to more frequent routine checkups, which can improve diabetes management.

5 Conclusion

Section 4

In conclusion, this analysis highlights the significant role of social determinants, such as race, income, insurance coverage and access to healthcare shaping diabetes outcomes in both Pennsylvania and West Virginia. While higher income and insurance coverage in both states can associated with more frequent routine checkups, the racial and ethnic diversity in Pennsylvania may introduce additional complexities in healthcare access outcomes. The proximity of healthcare clinics also play a key role, with access varying across regions, highlighting the need for targeted interventions and address these factors to reduce health disparities and improve diabetes management for all populations.

Some limitations of this report includes the Social Determinants Database provides valuable information on social determinants of health, even at the county level, the BRFSS does not offer county-level data on diabetes prevalence. This limitation restricts the ability to perform a more granular analysis of diabetes rates across different regions within Pennsylvania and West Virginia. A more detailed county-level analysis would provide a clearer understanding of how diabetes prevalence may vary based on geographic location and population distribution, allowing for more targeted insights and recommendations.

##Future Directions The future directions for the projects are conducting analysis on additional social determinants like education, housing, and food security to deepen our understand on the outcomes. A multi-year analysis also can be done to understand the patterns of the disease throughout the years. A predictive model can be done to analyze and see the patterns to allocate our resources for diabetes management.

##Reference Social Determinants of Health Database | Agency for Healthcare Research and Quality. Available at: https://www.ahrq.gov/sdoh/data-analytics/sdoh-data.html.

BRFSS prevalence & trends data: Explore by topic Centers for Disease Control and Prevention. Available at: https://nccd.cdc.gov/BRFSSPrevalence/rdPage.aspx?rdReport=DPH_BRFSS.ExploreByTopic&irbLocationType=StatesAndMMSA&islClass=CLASS03&islTopic=TOPIC18&islYear=2023&rdRnd=11958.

National Diabetes Statistics Report Centers for Disease Control and Prevention. Available at: https://www.cdc.gov/diabetes/php/data-research/index.html.